Chromatin Immunoprecipitation Sequencing ◾ 241
Figure 6.16 shows the dot plot of the first ChIP-Seq sample.
The IDs, descriptions, and statistics of the significant GO terms are stored in
“chip1_GO.csv”, “chip2_GO.csv”, and “chip3_GO.csv”. In Figure 6.16, the top ten GO
terms are all associated with gene transcription, which reflects the biological
activity of RNA polymerase II (Pol II). The definitions of the GO terms can be looked
up at “http://www.informatics.jax.org/vocab/gene_ontology/”. Thus, ChIP-Seq provides
information about the functions of the protein studied.
We can also use the KEGG database of gene pathways to annotate the genes with
significant peaks. The “enrichKEGG()” function returns the enriched KEGG categories
with FDR control. The following code generates the KEGG pathway annotation, saves it
to a CSV file, and creates a dot plot for each sample (Figure 6.17):
#Chip1
# KEGG enrichment for the Entrez gene IDs of sample 1 ('hsa' = human)
ekegg1 <- enrichKEGG(gene = entrez1, organism = 'hsa',
                     pvalueCutoff = 0.05)
# Convert the result to a data frame and save it
cluster_kegg1 <- data.frame(ekegg1)
write.csv(cluster_kegg1, "kegg_chip1.csv")
dotplot(ekegg1)
#Chip2
ekegg2 <- enrichKEGG(gene = entrez2, organism = 'hsa',
                     pvalueCutoff = 0.05)
cluster_kegg2 <- data.frame(ekegg2)
write.csv(cluster_kegg2, "kegg_chip2.csv")
dotplot(ekegg2)
#Chip3
ekegg3 <- enrichKEGG(gene = entrez3, organism = 'hsa',
                     pvalueCutoff = 0.05)
cluster_kegg3 <- data.frame(ekegg3)
write.csv(cluster_kegg3, "kegg_chip3.csv")
dotplot(ekegg3)
The significant KEGG signaling pathways indicate the pathways most likely to be active in the cells.
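To make the idea of enrichment with FDR control more concrete, the following minimal sketch reproduces the general approach in base R: a hypergeometric over-representation test per pathway, followed by Benjamini-Hochberg adjustment of the p-values. It is only an illustration and not the internal code of “enrichKEGG()”; the pathway names, the background size, and all counts are invented for this example.

```r
# Toy over-representation test for pathways, followed by BH adjustment.
# All names and counts below are invented for illustration only.
universe_size <- 20000   # assumed number of annotated background genes
n_query       <- 500     # assumed number of genes with significant peaks

# Hypothetical pathways: size in the background and overlap with the query list
pathways <- data.frame(
  pathway  = c("pathway_A", "pathway_B", "pathway_C"),
  set_size = c(200, 150, 300),
  overlap  = c(15, 4, 8)
)

# P(X >= overlap) when n_query genes are drawn at random from the universe
pathways$pvalue <- phyper(pathways$overlap - 1,
                          pathways$set_size,
                          universe_size - pathways$set_size,
                          n_query,
                          lower.tail = FALSE)

# Benjamini-Hochberg adjustment controls the false discovery rate
pathways$p.adjust <- p.adjust(pathways$pvalue, method = "BH")

# Keep pathways that pass the same 0.05 cutoff used in the text
significant <- pathways[pathways$p.adjust < 0.05, ]
print(pathways)
```

A pathway is reported as enriched when the query list contains clearly more of its genes than expected by chance. Here the expected overlap for “pathway_A” is 500 × 200 / 20000 = 5 genes, so an observed overlap of 15 stands out, while the other two toy pathways are close to their expected counts.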
We can also compare enrichment across samples by using the “compareCluster()” function,
which requires a list of the genes from each sample (Figure 6.18).
# Create a list with genes from each sample
genes <- lapply(annotated_peaks, function(i) as.data.frame(i)$geneId)
# Run KEGG analysis
compKEGG <- compareCluster(geneCluster = genes,
                           fun = "enrichKEGG",
                           organism = "hsa",  # KEGG code for human, as in the calls above
                           pvalueCutoff = 0.05,
                           pAdjustMethod = "BH")
dotplot(compKEGG, showCategory = 10,
        title = "KEGG Pathway Enrichment Analysis")
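The structure of the gene list passed to “compareCluster()” can be checked on a small stand-in before running the full analysis. The sketch below uses a toy list of data frames in place of “annotated_peaks”; the object names “toy_peaks” and “toy_genes”, the sample names, and the gene IDs are placeholders chosen for illustration and are not results from this chapter.

```r
# Toy stand-in for annotated_peaks: one data frame per sample.
# Sample names and gene IDs are placeholders, not results from the chapter.
toy_peaks <- list(
  chip1 = data.frame(geneId = c("7157", "1956", "672", "7157")),
  chip2 = data.frame(geneId = c("1956", "672", "2064")),
  chip3 = data.frame(geneId = c("672", "4609"))
)

# Same extraction pattern as in the text, with duplicates removed per sample
toy_genes <- lapply(toy_peaks, function(i) unique(as.data.frame(i)$geneId))

# The list names are kept by lapply(); they identify the samples
print(names(toy_genes))

# Number of distinct genes per sample
print(lengths(toy_genes))

# Genes annotated in every sample
shared <- Reduce(intersect, toy_genes)
print(shared)
```

Because the result is a named list with one vector of gene IDs per sample, the names serve as the group labels when the samples are compared. Checking the list lengths and the overlap first is a quick way to spot an empty or mislabeled sample.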